Enabling Scientists to Understand their Data using Web-Based Statistical Tools
Eric Hare
About Me
- Graduated in 2008 from Snohomish High School in Snohomish, WA (an hour northeast of Seattle)
- Bachelor of Science with a major in Statistics (University of Washington, June 2012)
- Bachelor of Science in Computer Engineering (University of Washington, June 2012)
- Began graduate studies at Iowa State University in August 2012
- Master of Science in Statistics (Iowa State University, 2014)
- Declared Co-Major in Statistics and Computer Science, 2014
- Plan to graduate May 2017 and join Omni Analytics Group as Executive Data Scientist
Publications
- Hare, E., Hofmann, H., and Carriquiry, A., Automatic Matching of Bullet Lands. Accepted by AoAS
- Hare, E., and Kaplan, A., Introductory Statistics with intRo. Accepted by JCGS
- Hare, E., Buja, A., and Hofmann, H., Manipulation of Discrete Random Variables with discreteRV. R Journal Vol. 7 Iss. 1 (2015)
- Sieber, T., Hare, E., Hofmann, H., and Treppel, M., Biomathematical description of synthetic Peptide libraries. PLOS One Vol. 10 Iss. 6 (2015)
- Kaplan, A., and Hare, E., Putting Down Roots: A Graphical Exploration of Community Attachment (Accepted in Computational Statistics)
- Kaplan, A., Hare, E., Hofmann, H., and Cook, D., Can you buy a president? Politics after the Tillman Act. CHANCE Vol. 27 Iss. 1 (2014)
Awards
- ASA Imaging Section Student Paper Award, Automatic Matching of Bullet Lands, Received 2016
- Ron Wasserstein Award for Best Contributed Paper in Statistical Education, Introductory Statistics with intRo, Received 2015
- ASA Computing Section Student Paper Award, Introductory Statistics with intRo, Received 2015
- Kempthorne Award for Best Student in Linear Models, Received 2013
- ASA Data Exposition - 1st Place, Received 2013
Dissertation Outline
Working Title: Enabling Scientists to Understand their Data using Web-Based Statistical Tools
Papers:
- Automatic Matching of Bullet Lands (Accepted with Revisions in AoAS)
- Designing Modular Software: A Case Study in Introductory Statistics (Accepted with Revisions in JCGS)
- Visual Inference (In Progress)
Common Themes
- Reproducible Research
- Interactive Graphics
- Exploratory Data Analysis
- Three-pronged approach:
  - Statistical Component (Algorithm, optimization, lineup protocol)
  - Data Science Component (Open-source R packages)
  - Web-Based Component (Interface for non-developers)
Automatic Matching of Bullet Lands

Eric Hare, Heike Hofmann, Alicia Carriquiry
Center for Statistics and Applications in Forensic Evidence (CSAFE)
Goal
- We wish to determine whether two bullets were fired from the same gun barrel
- Striation patterns, or individual characteristics, are unique to barrels and fairly stable (Xie 2009)
- Forensic examiners fire test bullets from a suspect's gun and compare them to the bullet from the crime scene

Current Practice
- Traditionally, forensic scientists place the bullets under a comparison microscope, align them manually, and compare them
- (Arbitrary) thresholds have been established as “standard practice” (e.g., more than 6 consecutively matching striae to declare a match, Nichols 2003)
- However, these thresholds have little statistical foundation, and such practices have come under fire in the courtroom (Giannelli, 2011)
The problems culminated in a 2009 NAS report which found “much forensic evidence – including, for example, bite marks and firearm and toolmark identification – is introduced in criminal trials without any meaningful scientific validation, determination of error rates, or reliability testing.”
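To make the consecutively-matching-striae threshold mentioned above concrete, here is a minimal base R sketch. The helper `max_cms` and the example vector `matched` are hypothetical and only illustrate how such a rule could be evaluated once an examiner has marked each stria as matching or not.

```r
# Hypothetical helper: longest run of consecutive matches in a logical
# vector with one entry per stria (TRUE = the stria matches)
max_cms <- function(matched) {
  runs <- rle(matched)
  if (!any(runs$values)) return(0L)
  max(runs$lengths[runs$values])
}

# Made-up example: 2 matches, a miss, 7 matches, a miss, 1 match
matched <- c(TRUE, TRUE, FALSE, rep(TRUE, 7), FALSE, TRUE)

# Threshold from the slide: more than 6 consecutively matching striae
declare_match <- max_cms(matched) > 6
```

The rule reduces an entire comparison to a single count, which is part of why it is hard to attach an error rate to it.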
More Problems
A timely report was released on Tuesday by the President’s Council of Advisors on Science and Technology (PCAST) titled Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods.
A second—and more important—direction is (as with latent print analysis)
to convert firearms analysis from a subjective method to an objective
method. This would involve developing and testing image-analysis
algorithms for comparing the similarity of tool marks on bullets. [...]
In a recent study, researchers used images from an earlier study to
develop a computer-assisted approach to match bullets that minimizes
human input [338].
338: Hare, E., Hofmann, H., and A. Carriquiry. “Automatic matching of bullet lands.” Unpublished paper, available at: arxiv.org/pdf/1601.05788v2.pdf.
James Hamby Study
- Ten consecutively rifled Ruger P-85 pistol barrels used to fire 20 “known” test bullets and 15 “unknown” bullets for comparison
- Sets of 35 bullets sent to 507 Forensic Examiners for examination
- 0 false positive identifications, 8 inconclusive results (out of 7605)
- Can an automated algorithm do as well?
# Read one land scan and render it as a 3D surface
# (the path must be a single unbroken string)
plot3D.x3p.file(read.x3p("~/GitHub/imaging-paper/app/images/Hamby252_3DX3P1of2/Br1 Bullet 1-5.x3p"),
    plot.type = "surface")
Step One (Continued)
library(ggplot2)  # provides qplot() and theme_bw()
br111 <- get_crosscut("images/Br1 Bullet 1-5.x3p", x = 243.75)
qplot(y, value, data = br111) + theme_bw()

Step Two: Remove Shoulders
The striations that identify a bullet to a gun barrel are located in the land impression areas (Xie 2009).
- At a fixed height x, extract the bullet’s profile (previous figure, with x = 243.75μm).
- For each y value, smooth out deviations near the minima by applying a rolling average with a pre-set smoothing factor s.
- For each smoothed y value, compute a second rolling average using the same smoothing factor s.
- Locate the peaks of the shoulders by finding the first and last doubly-smoothed values y_i that are the maximum within their smoothing window.
Parameters: s = 35μm
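The double-smoothing steps above can be sketched in base R as follows. This is an illustration on a simulated profile, not the code behind `get_grooves`. The helpers `rolling_mean` and `find_shoulders` are hypothetical, and the sketch assumes s is an odd window width measured in points rather than in μm.

```r
# Centered rolling mean with odd window width s; the edges come back as NA
rolling_mean <- function(v, s) {
  as.numeric(stats::filter(v, rep(1 / s, s), sides = 2))
}

# Hypothetical helper: first and last index whose doubly-smoothed value is
# the maximum within its own (fully observed) window of width s
find_shoulders <- function(value, s = 35) {
  half <- (s - 1) / 2
  smooth2 <- rolling_mean(rolling_mean(value, s), s)
  n <- length(smooth2)
  is_peak <- vapply(seq_len(n), function(i) {
    if (i - half < 1 || i + half > n) return(FALSE)
    win <- smooth2[(i - half):(i + half)]
    !anyNA(win) && smooth2[i] == max(win)
  }, logical(1))
  range(which(is_peak))
}

# Simulated profile (made up): a curved land with a bump near each edge
set.seed(1)
y <- 1:601
value <- 100 - 0.001 * (y - 301)^2 +
  40 * exp(-(y - 60)^2 / (2 * 15^2)) +
  40 * exp(-(y - 540)^2 / (2 * 15^2)) +
  rnorm(length(y), sd = 0.3)

shoulders <- find_shoulders(value, s = 35)
```

Requiring a fully observed window keeps the NA edges produced by the rolling mean from being flagged as peaks.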
Identifying Shoulders (Easy)
br111.groove <- get_grooves(br111)
br111.groove$plot

Identifying Shoulders (Challenging)
result2 <- get_grooves(get_crosscut("~/GitHub/imaging-paper/app/images/Hamby252_3DX3P1of2/Br1 Bullet 1-6.x3p"))
result2$plot

Step Three: Fit Loess Regression
Locally weighted scatterplot smoothing (loess; Cleveland, 1979) fits a low-degree polynomial to a small subset of the data around each point, weighting values near the point being estimated more strongly.
br111.loess <- fit_loess(br111, br111.groove)
br111.loess$fitted

Step Four: Get the Residuals
Deviations from the loess fit should represent the imperfections (striations) on the bullet. Hence, we extract the residuals from the model.
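As a rough illustration of Steps Three and Four, the sketch below fits a loess curve to a simulated profile with base R's `loess()` and keeps the residuals as the signature. The data, the variable names, and the `span` value are assumptions made for this example. They are not the settings used by `fit_loess`.

```r
set.seed(2)
y <- seq(0, 2000, by = 5)                    # positions along the land
curvature <- 150 - 1e-4 * (y - 1000)^2       # large-scale bullet curvature
striae <- 2 * sin(y / 40)                    # small-scale markings (made up)
land <- data.frame(y = y,
                   value = curvature + striae + rnorm(length(y), sd = 0.2))

# Fit the overall shape; span = 0.75 is an assumed value for illustration
fit <- loess(value ~ y, data = land, span = 0.75)

land$fitted <- fitted(fit)      # smooth structure of the land
land$resid  <- residuals(fit)   # what remains: the striation signature
```

Because the span is wide relative to the striae, the fit follows the curvature and leaves the fine markings in the residuals.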

Step Five: Peaks and Valleys
As with detecting the shoulders, we can smooth the deviations and compute derivatives to identify peaks and valleys in the signature.
br111.peaks <- get_peaks(br111.loess$data)
br111.peaks$plot
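The idea of smoothing and then looking at derivatives can be sketched in base R. The helper `find_extrema` is hypothetical and is not the code behind `get_peaks`. It smooths with a rolling mean and flags sign changes in the first differences. The window width and the test signal are assumptions.

```r
# Hypothetical helper: indices of peaks and valleys in a smoothed signature
find_extrema <- function(sig, s = 5) {
  smoothed <- as.numeric(stats::filter(sig, rep(1 / s, s), sides = 2))
  slope_sign <- sign(diff(smoothed))   # +1 rising, -1 falling
  change <- diff(slope_sign)           # -2 at a peak, +2 at a valley
  list(peaks   = which(change == -2) + 1,
       valleys = which(change == 2) + 1)
}

# Made-up signature: two full periods of a sine wave
sig <- sin(seq(0, 4 * pi, length.out = 201))
extrema <- find_extrema(sig)
```

Exactly flat stretches (slope sign 0) are ignored by this sketch; a real signature would need a rule for them.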

Step Six: Bullet Alignment
The previous five steps are performed for each bullet land. But now we wish to extract features for cross comparisons of bullet lands.
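The slide does not say how two signatures are aligned. One common approach, assumed here purely for illustration, is to slide one signature against the other and keep the lag with the highest correlation. The helper `best_lag` and the simulated signatures are hypothetical.

```r
# Hypothetical helper: lag k (in points) at which a[i] best lines up
# with b[i + k], judged by Pearson correlation over the overlap
best_lag <- function(a, b, max_lag = 30) {
  lags <- -max_lag:max_lag
  idx <- seq_along(a)
  cors <- vapply(lags, function(k) {
    keep <- idx + k >= 1 & idx + k <= length(b)
    cor(a[idx[keep]], b[idx[keep] + k])
  }, numeric(1))
  lags[which.max(cors)]
}

# Two made-up signatures cut from the same signal, offset by 7 points
set.seed(3)
t <- 1:400
sig <- sin(t / 9) + 0.5 * sin(t / 3.1)
b <- sig[1:300] + rnorm(300, sd = 0.05)
a <- sig[8:307] + rnorm(300, sd = 0.05)

lag <- best_lag(a, b)
```

Once two lands are aligned this way, features such as the number of matching peaks can be computed on the overlapping region.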

Step Six (Continued)

Distribution of Features

Step Seven: Random Forest

Feature Importance

Future Work
- We are extremely limited by data: so far we have assessed this algorithm on only one set of 35 bullets (new data is on the way, however)
- There is likely an effect from the equipment used to scan the bullets, or the person performing the scan, and this should be accounted for
- We need to assess the quality of the algorithm in case we don’t have full bullet lands available, as is the case in many forensics applications (degraded bullets)
- This future work will be part of a follow-up paper, in progress.
Designing Modular Software: A Case Study in Introductory Statistics
- Paper on intRo, a modular, extensible, web-based software system for teaching introductory statistics, which produces reproducible R code resulting from actions taken in the GUI.
- Focuses on the construction of a modular software system using intRo itself as a case study.
- Accepted with revisions in the Journal of Computational and Graphical Statistics.
Dissertation Chapter Three
Visual Inference
- The lineup protocol (Buja, 2009) is an inferential framework which acts as an exploratory data analysis analogue to traditional hypothesis testing.
- Key Idea: recognizing the parallel between discoveries in a graphical display and rejection of a null hypothesis in a traditional hypothesis test.
- The concept extends to null plots, which are the visual inference analogue of null distributions in traditional hypothesis testing.
- Null plots are a possibly infinite set of displays that are randomly generated by sampling from the null hypothesis.
- By placing a target plot within a set of m null plots and asking observers to identify the “most different” plot, a visual inference test can be conducted.
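A lineup as described above can be sketched in base R. The example is an illustration under stated assumptions: null data are generated by permuting y (a null of no association between x and y), m = 19 is an arbitrary choice, and the p-value treats observers as independent with a 1 / (m + 1) chance of picking the target by luck. The function `visual_p_value` is hypothetical and is not part of the framework under development.

```r
set.seed(4)
m <- 19                                   # number of null plots (assumed)
dat <- data.frame(x = rnorm(50))
dat$y <- 0.6 * dat$x + rnorm(50)          # observed data (simulated here)

# Null data sets: permuting y breaks any association between x and y
nulls <- replicate(m, transform(dat, y = sample(y)), simplify = FALSE)

# Hide the observed data at a random position among the m + 1 panels
pos <- sample(m + 1, 1)
panels <- append(nulls, list(dat), after = pos - 1)

# Simplified visual p-value: P(X >= k) for X ~ Binomial(n_obs, 1 / (m + 1)),
# where k of n_obs observers picked the target plot
visual_p_value <- function(k, n_obs, m) {
  pbinom(k - 1, n_obs, 1 / (m + 1), lower.tail = FALSE)
}

p_one_observer <- visual_p_value(1, 1, m)  # single observer who picks the target
```

Each element of `panels` would then be drawn as one scatterplot in a grid, with `pos` kept hidden from the observer.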
User-facing Side
A modernization of the framework to run lineup experiments and identify the most different plot in a set of null plots is in development.

Experimenter-facing Side
A new service, still in its early stages, will let researchers set up and run lineup experiments automatically while specifying the randomization scheme, stratification factors, and parameters of the study.

Deliverables Timeline

Thank You
Special thanks to Alan Zheng at the National Institute of Standards and Technology for maintaining the NIST Ballistics Toolmark Research Database and providing many useful suggestions for our algorithm.
Any Questions?